Spatial scalable compression

ABSTRACT

A method and apparatus for providing spatial scalable compression using adaptive content filtering of a video stream is disclosed. The video stream is downsampled to reduce the resolution of the video stream. The downsampled video stream is encoded to produce a base stream. The base stream is decoded and upconverted to produce a reconstructed video stream. The reconstructed video stream is subtracted from the video stream to produce a residual stream. The resulting residual stream is encoded in an enhancement encoder and outputs an enhancement stream. The residual signal in selected frames is muted in the enhancement encoder while the motion information in the frame is maintained.

FIELD OF THE INVENTION

The invention relates to spatial scalable compression schemes.

BACKGROUND OF THE INVENTION

Because of the massive amounts of data inherent in digital video, the transmission of full-motion, high-definition digital video signals is a significant problem in the development of high-definition television. More particularly, each digital image frame is a still image formed from an array of pixels according to the display resolution of a particular system. As a result, the amounts of raw digital information included in high resolution video sequences are massive. In order to reduce the amount of data that must be sent, compression schemes are used to compress the data. Various video compression standards or processes have been established, including MPEG-1, MPEG-2, MPEG-4, H.263, and H.264.

Many applications are enabled where video is available at various resolutions and/or qualities in one stream. Methods to accomplish this are loosely referred to as scalability techniques. There are three axes on which one can deploy scalability. The first is scalability on the time axis, often referred to as temporal scalability. Secondly, there is scalability on the quality axis, often referred to as signal-to-noise scalability or fine-grain scalability. The third axis is the resolution axis (number of pixels in image) often referred to as spatial scalability or layered coding. In layered coding, the bitstream is divided into two or more bitstreams, or layers. Each layer can be combined to form a single high quality signal. For example, the base layer may provide a lower quality video signal, while the enhancement layer provides additional information that can enhance the base layer image.

In particular, spatial scalability can provide compatibility between different video standards or decoder capabilities. With spatial scalability, the base layer video may have a lower resolution than the input video sequence, in which case the enhancement layer carries information which can restore the resolution of the base layer to the input sequence level.

Most video compression standards support spatial scalability. FIG. 1 illustrates a block diagram of an encoder 100 which supports MPEG-2/MPEG-4 spatial scalability. The encoder 100 comprises a base encoder 112 and an enhancement encoder 114. The base encoder is comprised of a low pass filter and downsampler 120, a motion estimator 122, a motion compensator 124, an orthogonal transform (e.g., Discrete Cosine Transform (DCT)) circuit 130, a quantizer 132, a variable length coder 134, a bitrate control circuit 135, an inverse quantizer 138, an inverse transform circuit 140, switches 128, 144, and an interpolate and upsample circuit 150. The enhancement encoder 114 comprises a motion estimator 154, a motion compensator 155, a selector 156, an orthogonal transform (e.g., Discrete Cosine Transform (DCT)) circuit 158, a quantizer 160, a variable length coder 162, a bitrate control circuit 164, an inverse quantizer 166, an inverse transform circuit 168, and switches 170 and 172. The operations of the individual components are well known in the art and will not be described in detail.

Unfortunately, the coding efficiency of this layered coding scheme is not very good. Indeed, for a given picture quality, the bitrate of the base layer and the enhancement layer together for a sequence is greater than the bitrate of the same sequence coded at once.

FIG. 2 illustrates another known encoder 200 proposed by DemoGrafx. The encoder is comprised of substantially the same components as the encoder 100 and the operation of each is substantially the same so the individual components will not be described. In this configuration, the residue difference between the input block and the upsampled output from the upsampler 150 is inputted into a motion estimator 154. To guide/help the motion estimation of the enhancement encoder, the scaled motion vectors from the base layer are used in the motion estimator 154 as indicated by the dashed line in FIG. 2. However, this arrangement does not significantly overcome the problems of the arrangement illustrated in FIG. 1.

While spatial scalability, as illustrated in FIGS. 1 and 2, is supported by the video compression standards, spatial scalability is not often used due to a lack of coding efficiency. The lack of efficient coding means that, for a given picture quality, the bit rate of the base layer and the enhancement layer together for a sequence is greater than the bit rate of the same sequence coded at once.

SUMMARY OF THE INVENTION

It is an object of the invention to overcome the above-described deficiencies of the known spatial scalability schemes by providing more efficient spatial scalable compression schemes by slightly reducing the picture quality in every other picture frame.

According to one embodiment of the invention, a method and apparatus for providing spatial scalable compression using adaptive content filtering of a video stream is disclosed. The video stream is downsampled to reduce the resolution of the video stream. The downsampled video stream is encoded to produce a base stream. The base stream is decoded and upconverted to produce a reconstructed video stream. The reconstructed video stream is subtracted from the video stream to produce a residual stream. The resulting residual stream is encoded in an enhancement encoder and outputs an enhancement stream. Information in selected frames is muted in the enhancement encoder.

According to another embodiment of the invention, a method and apparatus for decoding compressed video information received in a base stream and an enhancement stream is disclosed. The base stream is decoded and then upconverted to increase the resolution of the decoded base stream. The encoded frames are decoded in the enhancement stream to create a first decoded enhancement stream. The upconverted decoded base stream is combined with the enhancement stream to produce a video output. In addition, a second decoded enhancement stream is generated for empty frames in the received enhancement stream using a temporal interpolation algorithm. The first and second decoded enhancement streams are interleaved to create an interleaved enhancement stream.

These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereafter.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described, by way of example, with reference to the accompanying drawings, wherein:

FIG. 1 is a block schematic representation of a known encoder with spatial scalability;

FIG. 2 is a block schematic representation of a known encoder with spatial scalability;

FIG. 3 is a block schematic representation of an encoder with spatial scalability according to one embodiment of the invention;

FIG. 4 is a block schematic representation of a layer decoder for decoding a video stream from the encoder illustrated in FIG. 3 according to one embodiment of the invention;

FIG. 5 is a block schematic representation of an encoder with spatial scalability according to another embodiment of the invention; and

FIG. 6 is a block schematic representation of a decoder for decoding a video stream from the encoder illustrated in FIG. 5 according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

According to one embodiment of the invention, at least some frame information is muted in the enhancement encoder. For example, motion vectors are encoded instead of complete B-frames. Since B-frames are not used as references for subsequent predictions, these frames can easily be left out. However, the result of leaving these frames out is an unacceptable loss in picture quality due to the fact that one can clearly see the change in resolution and sharpness in every other frame. These problems can be overcome by coding the motion vectors instead of the complete B-frame on the enhancement layer as will be described below. By inserting empty B-frames into the enhancement encoder, a reduction in the size of the enhancement layer can be obtained.

FIG. 3 is a schematic diagram of an encoder according to one embodiment of the invention. It will be understood that this is an illustrative example of an encoder which can be used to implement the invention, and other encoders can also be used to implement the invention. The depicted encoding system 300 accomplishes layered compression, whereby a portion of the channel is used for providing a low resolution base layer and the remaining portion is used for transmitting enhancement information, whereby the two signals may be recombined to bring the system up to high resolution.

The encoder 300 comprises a base encoder 312 and an enhancement encoder 314. The base encoder is comprised of a low pass filter and downsampler 320, a motion estimator 322, a motion compensator 324, an orthogonal transform (e.g., Discrete Cosine Transform (DCT)) circuit 330, a quantizer 332, a variable length coder (VLC) 334, a bitrate control circuit 335, an inverse quantizer 338, an inverse transform circuit 340, switches 328, 344, and an interpolate and upsample circuit 350.

An input video block 316 is split by a splitter 318 and sent to both the base encoder 312 and the enhancement encoder 314. In the base encoder 312, the input block is inputted into a low pass filter and downsampler 320. The low pass filter reduces the resolution of the video block which is then fed to the motion estimator 322. The motion estimator 322 processes picture data of each frame as an I-picture, a P-picture, or a B-picture. Each of the pictures of the sequentially entered frames is processed as one of the I-, P-, or B-pictures in a pre-set manner, such as in the sequence of I, B, P, B, P, . . . , B, P. That is, the motion estimator 322 refers to a pre-set reference frame in a series of pictures stored in a frame memory (not illustrated) and detects the motion vector of a macro-block, that is, a small block of 16 pixels by 16 lines of the frame being encoded, by pattern matching (block matching) between the macro-block and the reference frame.

In MPEG, there are four picture prediction modes, namely intra-coding (intra-frame coding), forward predictive coding, backward predictive coding, and bi-directional predictive coding. An I-picture is an intra-coded picture, a P-picture is an intra-coded or forward predictive coded picture, and a B-picture is an intra-coded, forward predictive coded, backward predictive coded, or bi-directional predictive coded picture.

The motion estimator 322 performs forward prediction on a P-picture to detect its motion vector. Additionally, the motion estimator 322 performs forward prediction, backward prediction, and bi-directional prediction for a B-picture to detect the respective motion vectors. In a known manner, the motion estimator 322 searches, in the frame memory, for a block of pixels which most resembles the current input block of pixels. Various search algorithms are known in the art. They are generally based on evaluating the mean absolute difference (MAD) or the mean square error (MSE) between the pixels of the current input block and those of the candidate block. The candidate block having the least MAD or MSE is then selected to be the motion-compensated prediction block. Its relative location with respect to the location of the current input block is the motion vector.
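By way of illustration only, the block-matching search described above can be sketched as follows. The sketch assumes an exhaustive (full) search over a small window with MAD as the matching criterion; the function names `mad`, `get_block` and `full_search`, the frame representation (lists of rows of integer samples) and the search window are hypothetical and are not taken from the motion estimator 322 itself, which may use any known search algorithm.

```python
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized 2-D blocks."""
    total = 0
    count = 0
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(current, reference, top, left, size, search_range):
    """Exhaustively test every displacement within +/- search_range.

    Returns ((dy, dx), cost), where (dy, dx) is the location of the best
    candidate block in the reference frame relative to the current block,
    i.e. the motion vector, and cost is its MAD.
    """
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = mad(target, get_block(reference, y, x, size))
            if cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

Replacing `mad` with a mean-square-error function would give the MSE variant mentioned above; a practical encoder would typically use 16 by 16 macro-blocks and a faster search strategy than the exhaustive loop shown here.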

Upon receiving the prediction mode and the motion vector from the motion estimator 322, the motion compensator 324 may read out encoded and already locally decoded picture data stored in the frame memory in accordance with the prediction mode and the motion vector and may supply the read-out data as a prediction picture to arithmetic unit 325 and switch 344. The arithmetic unit 325 also receives the input block and calculates the difference between the input block and the prediction picture from the motion compensator 324. The difference value is then supplied to the DCT circuit 330.

If only the prediction mode is received from the motion estimator 322, that is, if the prediction mode is the intra-coding mode, the motion compensator 324 may not output a prediction picture. In such a situation, the arithmetic unit 325 may not perform the above-described processing, but instead may directly output the input block to the DCT circuit 330 through switch 328. In such a situation, the I-frames are forwarded to the DCT circuit 330.

The DCT circuit 330 performs DCT processing on the output signal from the arithmetic unit 325 so as to obtain DCT coefficients which are supplied to a quantizer 332. The quantizer 332 sets a quantization step (quantization scale) in accordance with the data storage quantity in a buffer (not illustrated) received as a feedback and quantizes the DCT coefficients from the DCT circuit 330 using the quantization step. The quantized DCT coefficients are supplied to the VLC unit 334 along with the set quantization step.

The VLC unit 334 converts the quantization coefficients supplied from the quantizer 332 into a variable length code, such as a Huffman code, in accordance with the quantization step supplied from the quantizer 332. The resulting converted quantization coefficients are outputted to a buffer (not illustrated). The quantization coefficients and the quantization step are also supplied to an inverse quantizer 338 which dequantizes the quantization coefficients in accordance with the quantization step so as to convert the same to DCT coefficients. The DCT coefficients are supplied to the inverse DCT unit 340 which performs inverse DCT on the DCT coefficients. The obtained inverse DCT coefficients are then supplied to the arithmetic unit 348.
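The relationship between the quantizer 332 and the inverse quantizer 338 can be illustrated with a minimal sketch. It assumes a plain uniform scalar quantizer with a single quantization step for all coefficients; this is a simplification chosen for illustration, since practical MPEG quantizers also apply per-coefficient weighting matrices. The names `quantize` and `dequantize` are hypothetical.

```python
def quantize(coefficients, step):
    """Map each DCT coefficient to an integer level using one uniform step."""
    return [[int(round(value / step)) for value in row] for row in coefficients]


def dequantize(levels, step):
    """Reconstruct approximate DCT coefficients from the integer levels."""
    return [[level * step for level in row] for row in levels]


# A larger step gives smaller levels (fewer bits) but a coarser reconstruction.
coefficients = [[100, -37], [7, 3]]
levels = quantize(coefficients, step=16)
reconstructed = dequantize(levels, step=16)
```

The difference between `coefficients` and `reconstructed` is the quantization loss; it is because of this loss that the encoder decodes its own output locally before using it as a reference.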

The arithmetic unit 348 receives the inverse DCT coefficients from the inverse DCT unit 340 and the data from the motion compensator 324 depending on the location of switch 344. The arithmetic unit 348 adds the signal (prediction residuals) from the inverse DCT unit 340 to the predicted picture from the motion compensator 324 to locally decode the original picture. However, if the prediction mode indicates intra-coding, the output of the inverse DCT unit 340 may be directly fed to the frame memory. The decoded picture obtained by the arithmetic unit 348 is sent to and stored in the frame memory so as to be used later as a reference picture for an inter-coded picture, forward predictive coded picture, backward predictive coded picture, or a bi-directional predictive coded picture.

The enhancement encoder 314 comprises a motion estimator 354, a motion compensator 356, a DCT circuit 368, a quantizer 370, a VLC unit 372, a bitrate controller 374, an inverse quantizer 376, an inverse DCT circuit 378, switches 366 and 382, subtractors 358 and 364, and adders 380 and 388. In addition, the enhancement encoder 314 may also include DC-offsets 360 and 384, adder 362 and subtractor 386. The operation of many of these components is similar to the operation of similar components in the base encoder 312 and will not be described in detail.

The output of the arithmetic unit 348 is also supplied to the upsampler 350 which generally reconstructs the filtered out resolution from the decoded video stream and provides a video data stream having substantially the same resolution as the high-resolution input. However, because of the filtering and losses resulting from the compression and decompression, certain errors are present in the reconstructed stream. The errors are determined in the subtraction unit 358 by subtracting the reconstructed high-resolution stream from the original, unmodified high resolution stream.
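The formation of the residual signal can be sketched as follows, purely as an illustration. The sketch assumes a factor-of-two resolution change, 2 by 2 averaging as a crude stand-in for the low pass filter and downsampler 320, and pixel replication as a stand-in for the interpolate and upsample circuit 350; the base-layer compression itself is omitted. All function names are hypothetical.

```python
def downsample_2x(frame):
    """Average each 2x2 neighbourhood (crude low-pass filter plus decimation)."""
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, len(frame[0]), 2)
        ]
        for y in range(0, len(frame), 2)
    ]


def upsample_2x(frame):
    """Repeat every pixel twice horizontally and every row twice vertically."""
    output = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(2)]
        output.append(wide)
        output.append(list(wide))
    return output


def subtract(frame_a, frame_b):
    """Pixel-wise difference frame_a - frame_b."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(frame_a, frame_b)]


def add(frame_a, frame_b):
    """Pixel-wise sum, as performed when the two layers are recombined."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(frame_a, frame_b)]


original = [
    [10, 12, 20, 22],
    [14, 16, 24, 26],
    [30, 30, 40, 44],
    [30, 30, 40, 44],
]
base = downsample_2x(original)            # low-resolution base layer input
reconstructed = upsample_2x(base)         # upconverted base layer
residual = subtract(original, reconstructed)  # signal for the enhancement encoder
```

Adding `residual` back to `reconstructed` recovers `original` exactly in this sketch because no lossy coding is applied; in the encoder 300 both layers are quantized, so the recombined picture is only an approximation.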

According to one embodiment of the invention illustrated in FIG. 3, the original unmodified high-resolution stream is also provided to the motion estimator 354. The reconstructed high-resolution stream is also provided to an adder 388 which adds the output from the inverse DCT 378 (possibly modified by the output of the motion compensator 356 depending on the position of the switch 382). The output of the adder 388 is supplied to the motion estimator 354. As a result, the motion estimation is performed on the upscaled base layer plus the enhancement layer instead of the residual difference between the original high-resolution stream and the reconstructed high-resolution stream. This motion estimation produces motion vectors that track the actual motion better than the vectors produced by the known systems of FIGS. 1 and 2. This leads to a perceptually better picture quality especially for consumer applications which have lower bit rates than professional applications.

As mentioned above, the size of the enhancement layer can be reduced without much reduction in picture quality, by inserting empty B-frames (muting frame information) into the enhancement encoder. This can be accomplished by using the switch 366. The switch 366 can be positioned so that empty B-frames (no DCT-coefficients) and the motion vectors are supplied to the DCT circuit 368. As a result, the motion vectors are encoded by the enhancement encoder 314.
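The effect of the switch 366 can be modelled with a short sketch, offered only as an illustration. It assumes a hypothetical `EnhancementFrame` record holding a picture type, a set of motion vectors and a set of residual DCT coefficients; neither this record nor the function `mute_b_frames` corresponds to an actual bitstream syntax.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class EnhancementFrame:
    """Hypothetical container for one frame of the enhancement layer."""
    picture_type: str                       # "I", "P" or "B"
    motion_vectors: Tuple[Tuple[int, int], ...]
    coefficients: Tuple[int, ...]           # residual DCT coefficients


def mute_b_frames(frames: List[EnhancementFrame]) -> List[EnhancementFrame]:
    """Drop the residual of every B-frame while keeping its motion vectors."""
    muted = []
    for frame in frames:
        if frame.picture_type == "B":
            # Empty B-frame: no DCT coefficients, motion information retained.
            muted.append(replace(frame, coefficients=()))
        else:
            muted.append(frame)
    return muted
```

Because the motion vectors survive, a decoder can still form a motion-compensated prediction for the muted frames, which is what distinguishes this embodiment from leaving the B-frames out altogether.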

FIG. 4 illustrates a layered decoder 400 for decoding the layered bit stream produced by the encoder 300 illustrated in FIG. 3. It will be understood by those skilled in the art that other layered decoders could also be used and the invention is not limited thereto. The layered decoder 400 has a base decoder 402 and an enhancement decoder 404. The base stream from the base encoder is input into the VLD unit 406. The VLD unit 406 decodes the base stream and supplies the motion vectors to a motion compensator 408. The rest of the decoded stream is supplied to an inverse DCT unit 410. The inverse DCT unit 410 performs an inverse DCT on the DCT coefficients. The resulting signal is supplied to an inverse quantizer 412. The output of the inverse quantizer 412 and the output of the motion compensator 408 are added together by addition unit 414 to create an SD-output signal 416. The SD-output signal 416 is also fed back to the motion compensator 408.

The enhancement decoder 404 also contains a VLD unit 418, an inverse DCT unit 420, an inverse quantizer 422, a motion compensator 424 and an addition unit 426 which operate in a similar manner as the like elements of the base decoder 402. The enhancement decoder 404 decodes the frames in the encoded enhancement stream, wherein in at least some of the frames the residual signal has been muted while motion information is maintained in these frames. To create an HD-output, the output of the addition unit 426 is added to the decoded SD-output signal 416 which has been upconverted by an upconverting unit 428 in an addition unit 430.

According to another embodiment of the invention, some frames are encoded and some frames are skipped (muted) in the enhancement layer and a motion compensating algorithm can be used at the decoder to generate the enhancement layer for the skipped frames. FIG. 5 is a schematic diagram of an illustrative encoder 500 which can be used to implement this embodiment of the invention. It will be understood by those skilled in the art that other encoders can also be used to implement the invention. The encoder 500 is similar to the encoder 300 described above with reference to FIG. 3. Like reference numerals have been used for like elements and a full description of these like elements will not be provided for the sake of brevity. The encoder 500 has two switches 502 and 504 which are different from the encoder 300. The switch 502 is positioned to select I-frames or P-, B-frames for encoding by the enhancement encoder 314. The second switch 504 is provided on the output of the enhancement encoder 314. The switch 504 can be moved back and forth so as to select encoded frames or empty frames for transmission. For example, the switch 504 can be moved after each frame is outputted so that every other frame in the encoded enhancement stream is coded and the other frames are skipped (muted). By skipping (muting) frames in the encoded enhancement stream, the size of the enhancement stream can be greatly reduced.
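The alternating behaviour of the switch 504 can be sketched as follows. The sketch assumes that a skipped (empty) frame is represented by `None` and that the first frame of the sequence is always transmitted; the function name `select_for_transmission` and the `period` parameter are hypothetical.

```python
def select_for_transmission(encoded_frames, period=2):
    """Transmit one frame out of every `period`; mark the others as empty.

    With the default period of 2, every other frame is kept and the frames
    in between are replaced by None, which stands for an empty frame.
    """
    return [
        frame if index % period == 0 else None
        for index, frame in enumerate(encoded_frames)
    ]


# Frames are labelled by picture type here purely for readability.
transmitted = select_for_transmission(["I", "B", "P", "B", "P", "B", "P"])
```

With the frame order I, B, P, B, P, B, P used in this example, the frames that end up skipped are exactly the B-frames, which are not needed as references by the remaining frames.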

In order to prevent the skipped frames from harming the quality of the resulting picture, a temporal (motion compensated or non motion compensated) interpolation unit 602 is added to the decoder 600 which is illustrated in FIG. 6. The decoder 600 is similar to the decoder 400 and like reference numbers have been used for like elements. In this example, the base decoder 402 decodes the base stream in a known manner. In addition, the enhancement decoder 404 decodes the encoded frames of the enhancement stream in a known manner. The temporal interpolation unit 602 generates an enhancement layer output for the frames which have been skipped by analyzing the decoded enhancement stream from the enhancement decoder 404. In addition, the base layer output 416 can also be used to enhance the motion estimation in the temporal interpolation unit 602. In addition, the upconverted decoded base stream from the upconverter 428 can also be inputted into the temporal interpolation unit 602. The output of the enhancement decoder 404 is interleaved with the output of the temporal interpolation unit 602 by selectively moving switches 604 and 606 back and forth. The output of the switch 604 can be, for example, the stream IoPoPoP . . . , where o represents the B frames which were muted in the original residual signal. The temporal interpolation unit creates frames B′ which are interleaved with the output of switch 604 to create an interleaved stream IB′PB′PB′P . . . . The interleaved stream and the upconverted base stream are combined in addition unit 430 to create the HD-output stream.
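A minimal sketch of the interleaving described above is given below. It assumes the simplest non motion compensated form of temporal interpolation, namely averaging the nearest decoded enhancement frames on either side of a skipped frame; a motion compensated interpolator would replace this averaging step. Frames are represented as flat lists of residual samples, skipped frames as `None`, and the function names are hypothetical.

```python
def interpolate_skipped(stream):
    """Fill every None entry with the average of its nearest decoded neighbours.

    If a skipped frame has a decoded neighbour on one side only, that
    neighbour is repeated.
    """
    filled = []
    for index, frame in enumerate(stream):
        if frame is not None:
            filled.append(list(frame))
            continue
        previous = next((f for f in reversed(stream[:index]) if f is not None), None)
        following = next((f for f in stream[index + 1:] if f is not None), None)
        if previous is None and following is None:
            raise ValueError("no decoded frame available for interpolation")
        if previous is None:
            previous = following
        if following is None:
            following = previous
        filled.append([(a + b) / 2 for a, b in zip(previous, following)])
    return filled


def add_to_base(enhancement, upconverted_base):
    """Combine the interleaved enhancement stream with the upconverted base."""
    return [
        [e + b for e, b in zip(enh_frame, base_frame)]
        for enh_frame, base_frame in zip(enhancement, upconverted_base)
    ]


decoded = [[0, 0], None, [4, 8], None]          # pattern of decoded and skipped frames
interleaved = interpolate_skipped(decoded)       # skipped frames filled by interpolation
output = add_to_base(interleaved, [[10, 10]] * 4)
```

Averaging is adequate only for slowly changing residuals; the use of the base layer output 416 or the upconverted base stream to guide the interpolation, as mentioned above, is not modelled in this sketch.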

The above-described embodiments of the invention enhance the efficiency of spatial scalable compression schemes by lowering the bitrate of the enhancement layer by muting or partially muting some frames over the enhancement layer. It will be understood that the different embodiments of the invention are not limited to the exact order of the above-described steps as the timing of some steps can be interchanged without affecting the overall operation of the invention. Furthermore, the term “comprising” does not exclude other elements or steps, the terms “a” and “an” do not exclude a plurality and a single processor or other unit may fulfill the functions of several of the units or circuits recited in the claims.

1. An apparatus for performing spatial scalable compression of video information captured in a plurality of frames including an encoder for encoding and outputting the captured video frames into a compressed data stream, comprising: a base layer encoder for encoding a bitstream having a relatively low resolution, derived from original frames of input video information, to form a base layer; an enhancement layer encoder for encoding a residual signal, the residual signal being the difference between the original frames and upscaled frames from the base layer; and a muter for muting information in selected frames in said enhancement encoder.
2. The apparatus for performing spatial scalable compression of video information as claimed in claim 1, wherein for selected frames, the muter for muting information mutes the residual signal in the enhancement layer encoder while maintaining motion information for the selected frames.
3. The apparatus for performing spatial scalable compression of video information as claimed in claim 1, wherein the muter for muting information mutes selected complete frames of the encoded residual signal.
4. The apparatus for performing spatial scalable compression of video information as claimed in claim 1, wherein the muter for muting information mutes frame information at regular intervals.
5. The apparatus for performing spatial scalable compression of video information as claimed in claim 4, wherein the interval length is 2 frames.
6. A layered encoder for encoding an input video stream, comprising: a downsampling unit for reducing a resolution of the input video stream thereby forming a reduced resolution video stream; a base encoder for encoding the reduced resolution video stream to form a lower resolution base stream; an upconverting unit for decoding and increasing a resolution of the lower resolution base stream to produce a reconstructed video stream; a subtractor unit for subtracting the reconstructed video stream from the input video stream to produce a residual signal; an enhancement encoder for encoding the resulting residual signal from the subtractor unit and outputting an enhancement stream; and a muter for muting information in selected frames in the enhancement encoder.
7. The layered encoder as claimed in claim 6, wherein for selected frames, the muter for muting information mutes the residual signal in the enhancement encoder while maintaining motion information for the selected frames.
8. The layered encoder as claimed in claim 6, wherein the muter for muting information mutes selected complete frames of the encoded residual signal.
9. The layered encoder as claimed in claim 6, wherein the muter for muting information mutes frame information at regular intervals.
10. The layered encoder as claimed in claim 9, wherein the interval length is 2 frames.
11. A method for providing spatial scalable compression using adaptive content filtering of an input video stream, the method comprising the steps of: downsampling the input video stream to reduce the resolution of the video stream; encoding the downsampled video stream to produce a base stream; decoding and upconverting the base stream to produce a reconstructed video stream; subtracting the reconstructed video stream from the input video stream to produce a residual stream; encoding the residual stream in an enhancement encoder and outputting an enhancement stream; and muting information in selected frames in said enhancement encoder.
12. A layered decoder for decoding compressed video information, the layered decoder comprising: a base stream decoder for decoding a received base stream in the compressed video information; an upconverting unit for increasing the resolution of the decoded base stream; an enhancement stream decoder for decoding frames in a received encoded enhancement stream in the compressed video information to create a first decoded enhancement stream, wherein selected frames only contain motion information; and an addition unit for combining the upconverted decoded base stream and the first decoded enhancement stream to produce a video output.
13. The layered decoder as claimed in claim 12, wherein said layered decoder further comprises: a temporal interpolation unit for generating a second decoded enhancement stream for the selected frames in the received enhancement stream; and an interleaver for interleaving the first and second decoded enhancement streams into an interleaved enhancement stream.
14. The layered decoder as claimed in claim 13, wherein non-selected decoded frames are provided to the temporal interpolation unit.
15. The layered decoder as claimed in claim 13, wherein the decoded base stream is provided to the temporal interpolation unit.
16. The layered decoder as claimed in claim 13, wherein the upconverted decoded base stream is provided to the temporal interpolation unit.
17. A method for decoding compressed video information received in a base stream and an enhancement stream, the method comprising the steps of: decoding the base stream; upconverting the decoded base stream to increase a resolution of the decoded base stream; decoding frames in the enhancement stream to create a first decoded enhancement stream, wherein selected frames only contain motion information; and combining the upconverted decoded base stream with the first decoded enhancement stream to produce a video output.
18. The method as claimed in claim 17, wherein said method further comprises the steps of: generating a second decoded enhancement stream for the selected frames in the received enhancement stream using a temporal interpolation algorithm; and interleaving the first and second decoded enhancement streams to create an interleaved enhancement stream, and wherein said combining step combines the upconverted decoded base stream with the interleaved enhancement stream to produce the video signal.
19. The method as claimed in claim 18, wherein temporal interpolation is performed by a natural motion algorithm.